Distributed Spectral Dimensionality Reduction for Visualizing Textual Data

Author

  • Sanjay Krishnan
Abstract

We use a spectral clustering model to formulate a distributed implementation of Laplacian Eigenmaps on SPARK, which we call Distributed Spectral Dimensionality Reduction (DSDR). We evaluate DSDR by visualizing conceptual clusters of terms in textual data from 2149 short documents written by online contributors to a State Department website. We compare DSDR with PCA, Multidimensional Scaling, ISOMAP, and Locally Linear Embedding in terms of the Dunn Separation Index and computation time. We find that, for this dataset, DSDR is faster and better preserves high-dimensional cluster structure.
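For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal single-machine sketch of Laplacian Eigenmaps and of a Dunn-style separation index, written with NumPy only. It is not the authors' distributed SPARK implementation. The function names, the binary k-nearest-neighbour graph, the default parameters, and the use of the standard Dunn index (minimum between-cluster distance divided by maximum within-cluster diameter) are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from round-off


def laplacian_eigenmaps(X, n_components=2, n_neighbors=4):
    """Embed the rows of X with Laplacian Eigenmaps (illustrative sketch).

    Assumption: a symmetric k-nearest-neighbour graph with binary weights.
    """
    n = X.shape[0]
    d = pairwise_distances(X)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour

    # Build the symmetric adjacency matrix W of the kNN graph.
    nearest = np.argsort(d, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; skip the trivial first one.
    _, vecs = np.linalg.eigh(L)
    # Rescale to obtain solutions of the generalized problem (D - W) y = lambda D y.
    return vecs[:, 1:n_components + 1] * d_inv_sqrt[:, None]


def dunn_index(Y, labels):
    """Standard Dunn index: min between-cluster distance / max cluster diameter.

    Assumption: at least two clusters, and at least one cluster with two
    distinct points (otherwise the denominator is zero).
    """
    labels = np.asarray(labels)
    d = pairwise_distances(np.asarray(Y, dtype=float))
    same = labels[:, None] == labels[None, :]
    return d[~same].min() / d[same].max()
```

A higher index value indicates tighter, better separated clusters in the embedding, which is the sense in which an embedding can be said to preserve cluster structure. The dense distance matrix and full eigendecomposition used here scale poorly with the number of points, which is the kind of cost a distributed formulation aims to reduce.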

Similar resources

Performing a Preprocessing Step Before Feature Extraction in the Classification of Hyperspectral Image Data

Hyperspectral data potentially contain more information than multispectral data because of their higher spectral resolution. However, the stochastic data analysis approaches that have been successfully applied to multispectral data are not as effective for hyperspectral data. Various investigations indicate that the key problem that causes poor performance in the stochastic approaches t...

Dimensionality Reduction and Latent Variable Models for Online Collective Intelligence Systems

We explore dimensionality reduction and latent variable inference problems in an online Collective Intelligence system, Opinion Space. We propose a new dimensionality reduction algorithm derived from a spectral clustering model, called Distributed Spectral Dimensionality Reduction (DSDR), and implement it on the SPARK distributed platform. We applied our algorithm to visualize a 2D graphical ma...

Compressed Spectral Regression for Efficient Nonlinear Dimensionality Reduction

Spectral dimensionality reduction methods have recently emerged as powerful tools for various applications in pattern recognition, data mining and computer vision. These methods use information contained in the eigenvectors of a data affinity (i.e., item-item similarity) matrix to reveal the low dimensional structure of the high dimensional data. One of the limitations of various spectral dimen...

Visualizing and Exploring Dynamic High-Dimensional Datasets with LION-tSNE

T-distributed stochastic neighbor embedding (tSNE) is a popular and prize-winning approach for dimensionality reduction and visualizing high-dimensional data. However, tSNE is non-parametric: once a visualization is built, tSNE is not designed to incorporate additional data into the existing representation. This severely limits the applicability of tSNE to scenarios where data are added or updated ove...

Nonlinear dimensionality reduction: Alternative ordination approaches for extracting and visualizing biodiversity patterns in tropical montane forest vegetation data

Miguel D. Mahecha, Alfredo Martínez, Gunnar Lischeid, Erwin Beck. Ecological Modelling, Bayreuth Centre for Ecology and Ecosystem Research BayCEER, University of Bayreuth, 95440 Bayreuth, Germany; Max Planck Institute for Biogeochem...

Publication date: 2013